Scale-free download network for publications 
We report, for the first time, scale-free power-law behavior in the
statistics of the download frequency of publications. The download
data are taken from a well-constructed web page in the field of
economic physics (http://www.unifr.ch/econophysics/). Zipf-law
analysis and the Tsallis entropy method were used to fit the
download frequency. The power-law exponent of the rank-ordered
frequency distribution is $\gamma \approx 0.38 \pm 0.04$, which is
consistent with the power-law exponent
$\alpha \approx 3.37 \pm 0.45$ of the cumulated frequency
distribution. The preferential attachment model of Barabási and
Albert is used to explain the download network.

PACS numbers: 89.20.Hh, 89.75.Hc, 89.75.Da



Complex networks have recently become one of the most active
research fields, especially with regard to their statistical
mechanics. The rapid growth of the Internet has stimulated
physicists to investigate the rules governing networks. In a
pioneering work, Barabási and Albert found that the node degrees of
Internet routes and of URL (uniform resource locator) linked
networks in the WWW (World Wide Web) follow a power-law
distribution; such networks are called scale-free networks.

The power-law behavior of rank distributions is believed to be
related to Zipf's law, which Zipf found early in the last century.
Zipf originally made his remarkable observations about some basic
linguistic laws. More precisely, if we order the words appearing in
a text from the most to the least frequent, we can plot the number
of times each word appears as a function of its rank. Zipf showed
that, except for words of extremely low rank, an inverse power law
emerges (the so-called Zipf's law). That is, the frequency x obeys



$$ x \sim \mathrm{Rank}^{-\gamma}, \tag{1} $$



where $\gamma$ is the Zipf-law exponent. Zipf's law, like
scale-free networks, differs from the predictions of the purely
random networks introduced by Erdős and Rényi [5]. For the former,
Barabási and Albert proposed a preferential attachment model (BA
model) that yields the scale-free law of Internet links, and
Tsallis explained the statistical features of complex networks
using a non-extensive entropy (known as Tsallis entropy) approach.
The original BA model predicts the probability distribution
$p(k) \sim k^{-\alpha}$, where k is the degree of a network node
and $\alpha = 3$ (the corresponding rank-ordered law yields
$\gamma = 1/(\alpha - 1) = 1/2$). Extended and modified models
based on the BA model have been developed in order to obtain
$\alpha = 2$ to $4$ and thus fit realistic systems more precisely.
Recently, complex networks and/or Zipf's law have been explored in
very broad fields of science, including physics, electronics,
computer science, geology, sociology, economics, linguistics,
biology and many others. For instance, complex networks have been
observed for the WWW and the Internet, the movie actor
collaboration network, the science collaboration graph, cellular
networks, ecological networks, phone call networks, citation
networks, networks in linguistics, power and neural networks,
protein folding and interaction networks, the earthquake network,
firm growth and bankruptcy, and gene expression (see the review
literature), and even for the fragment hierarchical distribution
in nuclear dissociation and the hadronic production process.

Scale-free networks related to scientific publications have also
been explored. It was shown that the citation network of scientific
references and the collaboration graph of the co-authorship of
publications [13] satisfy a power law in the rank-ordered
distribution. Redner presented and discussed the distributions of
citations for two quite large data sets, namely (i) 6 716 198
citations of 783 339 papers, published in 1981 and cited between
1981 and June 1997, that have been catalogued by the Institute for
Scientific Information (ISI), and (ii) 351 872 citations, as of
June 1997, of 24 296 papers cited at least once and published in
Physical Review D (PRD) in volumes 11 through 50 (1975-1994). In
his study, Redner addressed the citations of publications, in
contrast to Laherrère and Sornette, who addressed, in a similar
study, the citations of authors. Denote by x the number of
citations and by N(x) the number of papers that are cited x times.
The main results of the study were that, for relatively large
values of x, $N(x) \propto 1/x^{\alpha}$ with $\alpha \approx 3$,
whereas, for relatively small values of x, the data were reasonably
well fitted by a stretched exponential, i.e.,
$N(x) \propto \exp[-(x/x_0)^{\beta}]$, $\beta$ and $x_0$ being the
fitting parameters ($\beta \approx 0.44$ and 0.39 for the ISI and
the PRD data, respectively).

FIG. 1: The distribution of the 100 most downloaded papers as of
03/28/2004 (http://www.unifr.ch/econophysics/). x represents the
download frequency and N(x) the number of papers that have been
downloaded x times.

To help expose these differences in the citation distribution,
Redner constructed the Zipf plot, in which the number of citations
of the kth-ranked paper out of an ensemble of M papers is plotted
versus rank k. By this definition, the Zipf plot is closely related
to the cumulative large-x tail of the citation distribution, and
hence it is well suited for determining that tail. The integral
nature of the Zipf plot also smooths the fluctuations in the
high-citation tail and thus facilitates quantitative analysis. For
the data sets mentioned above, he found a Zipf-law exponent
$\gamma$ (see Eq. (1)) close to 1/2, which is consistent with the
power-law exponent $\alpha = 3$ ($\alpha = 1 + 1/\gamma$) of the
citation distribution.
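
The relation between the two exponents follows from a short calculation. The sketch below assumes that N(x) is a pure power law over the whole large-x tail and that the rank of a paper is proportional to the number of papers with a larger count:

```latex
\begin{align}
\mathrm{Rank}(x) &\propto \int_{x}^{\infty} N(x')\,dx'
  \propto \int_{x}^{\infty} x'^{-\alpha}\,dx'
  \propto x^{-(\alpha-1)} \qquad (\alpha > 1), \\
x &\propto \mathrm{Rank}^{-1/(\alpha-1)}
  \quad\Rightarrow\quad
  \gamma = \frac{1}{\alpha-1}, \qquad \alpha = 1 + \frac{1}{\gamma}.
\end{align}
```

For $\alpha = 3$ this gives $\gamma = 1/2$, the value quoted for the citation data.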

In this work, we report that the rank-ordered download frequency
of the papers on a web page can also be described by Zipf's law.
The data set used here comes from a well-constructed web page in
the field of economic physics (so-called econophysics), run by
Y. C. Zhang since 1998. We explore the scale-free download network
and extract quantitative information about it by means of Zipf's
law and Tsallis' non-extensive entropy. The preferential attachment
network model of Barabási and Albert is used to explain the
mechanism of the network.

In terms of download frequency, the rank of a paper is defined
from the most downloaded paper (rank = 1) to the least downloaded
one. The distribution of the download frequency is shown in Fig. 1.
Roughly speaking, the download distribution can be fitted by a
power-law distribution, $N(x) \sim x^{-\alpha}$. The fit is good
below $x \sim 1000$, beyond which the last few points fluctuate
strongly. The extracted exponent is $\alpha \approx 3.37 \pm 0.45$.
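
The text does not say which fitting procedure was used. As a minimal sketch of one common choice, the following assumes an ordinary least-squares straight line in log-log coordinates. The function name `fit_power_law` and the sample histogram are hypothetical; the histogram is generated from an exact power law, not taken from the web page data.

```python
import math

def fit_power_law(xs, ns):
    """Least-squares line through (ln x, ln N); returns (alpha, ln of prefactor)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(n) for n in ns]
    m = len(lx)
    mean_x = sum(lx) / m
    mean_y = sum(ly) / m
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # N(x) ~ x**(-alpha), so alpha is minus the log-log slope
    return -slope, intercept

# Hypothetical, noise-free histogram built from N(x) = 1e12 * x**-3.37
xs = [200, 300, 450, 700, 1000]
ns = [1e12 * x ** -3.37 for x in xs]
alpha, log_prefactor = fit_power_law(xs, ns)
print(round(alpha, 2))
```

With real, noisy counts the strongly fluctuating points above x ~ 1000 would normally be excluded from the fit range, as the text suggests.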

Since we have the values of the rank-ordered frequency, we can
make the Zipf plot for the download distribution. Fig. 2(a) shows
12 Zipf plots for 12 selected dates, represented by different
symbols labeled in the format Year.Month.Date. These plots span the
period from 28 Sept 2003 to 28 March 2004. In order to minimize the
fluctuations of these plots due to statistics and network growth,
we average the 12 data sets and show the resulting Zipf plot in
Fig. 2(b). The Zipf power law (Eq. (1)) was used to fit Fig. 2(b),
and the extracted exponent is $\gamma \approx 0.38 \pm 0.04$. This
value leads to $\alpha = 1 + 1/\gamma = 3.63 \pm 0.28$, which is in
reasonable agreement with $\alpha = 3.37 \pm 0.45$ from the
download distribution of Fig. 1. It is in the range of 2 to 4 found
for various realistic networks [10].
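
The quoted uncertainty on the converted exponent can be reproduced with first-order error propagation. The sketch below assumes that this is how the error was obtained (the text does not say). The function name `alpha_from_gamma` is illustrative.

```python
def alpha_from_gamma(gamma, gamma_err):
    """Convert a Zipf exponent to the distribution exponent alpha = 1 + 1/gamma.

    The uncertainty uses first-order propagation:
    |d(alpha)/d(gamma)| * gamma_err = gamma_err / gamma**2.
    """
    alpha = 1.0 + 1.0 / gamma
    alpha_err = gamma_err / gamma ** 2
    return alpha, alpha_err

alpha, alpha_err = alpha_from_gamma(0.38, 0.04)
print(f"{alpha:.2f} +/- {alpha_err:.2f}")  # prints "3.63 +/- 0.28"
```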

On the other hand, Tsallis proposed a reasonable explanation and
fitted real data sets well using the non-extensive entropy theory.
In this theory, the probability distribution function obtained
with the expectation value held constant is as follows:



$$ p(x_k) \propto \frac{1}{[1 + (q-1)\lambda(x_k - \langle x_k \rangle)]^{q/(q-1)}}, \tag{2} $$



where $\langle x_k \rangle$ denotes the mathematical expectation of
$x_k$, $\lambda$ is a factor similar to a Lagrange multiplier, and
q is the characteristic parameter related to the exponent. When q
approaches 1, the Tsallis entropy becomes the Boltzmann-Gibbs
entropy and $p(x_k)$ approaches an exponential distribution
function.
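
The $q \to 1$ limit can be checked in a few lines. The sketch below assumes the distribution has the q-exponential form $[1 + (q-1)\lambda(x_k - \langle x_k \rangle)]^{-q/(q-1)}$ and introduces the shorthand $\varepsilon = q - 1$ and $a = \lambda(x_k - \langle x_k \rangle)$, which are not part of the original notation:

```latex
\begin{align}
\left[1 + \varepsilon a\right]^{-(1+\varepsilon)/\varepsilon}
  &= \exp\!\left[-\frac{1+\varepsilon}{\varepsilon}\,\ln(1 + \varepsilon a)\right] \\
  &= \exp\!\left[-(1+\varepsilon)\left(a - \tfrac{1}{2}\varepsilon a^{2} + \dots\right)\right]
  \;\xrightarrow[\;\varepsilon \to 0\;]{}\; e^{-a}
  = e^{-\lambda (x_k - \langle x_k \rangle)} .
\end{align}
```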

In rank-ordered statistics, the rank of $x_k$, $\mathrm{Rank}(x_k)$,
and the cumulative distribution function are connected by a simple
relation:

$$ \mathrm{Rank}(x_k) \propto \int_{x_k}^{\infty} p(x)\,dx = 1 - \int_0^{x_k} p(x)\,dx. \tag{3} $$

Integrating p(x) in Eq. (2), one obtains the following result:

$$ \mathrm{Rank} \propto \frac{1}{\lambda}\,[1 + (q-1)\lambda(x - \langle x \rangle)]^{-1/(q-1)}, \tag{4} $$

or, expressing x as a function of Rank,

$$ x = x_0 + \frac{b}{(\mathrm{Rank} - \mathrm{Rank}_{FS})^{q-1}}, \tag{5} $$

where $x_0$ and b are fitted parameters, and the parameter
$\mathrm{Rank}_{FS}$ is introduced here to take the finite-size
effect into account.
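
The integration and inversion steps can be written out explicitly. This is a sketch that assumes the distribution is $p(x) \propto [1 + (q-1)\lambda(x - \langle x \rangle)]^{-q/(q-1)}$ with $q > 1$. The substitution variable u and the proportionality constant C are introduced only for this illustration, and the finite-size shift of the rank is left out:

```latex
\begin{align}
u &\equiv 1 + (q-1)\lambda\,(x - \langle x \rangle), \qquad
   dx = \frac{du}{(q-1)\lambda}, \\
\int_{x}^{\infty} p(x')\,dx'
  &\propto \frac{1}{(q-1)\lambda}\int_{u}^{\infty} u'^{-q/(q-1)}\,du'
   = \frac{1}{\lambda}\,u^{-1/(q-1)}, \\
\mathrm{Rank} = \frac{C}{\lambda}\,u^{-1/(q-1)}
  \quad&\Rightarrow\quad
  x = \underbrace{\langle x \rangle - \frac{1}{(q-1)\lambda}}_{x_0}
    + \underbrace{\frac{(C/\lambda)^{q-1}}{(q-1)\lambda}}_{b}\,
      \mathrm{Rank}^{-(q-1)} .
\end{align}
```

In this form the rank-ordered frequency falls off as $\mathrm{Rank}^{-(q-1)}$ at large rank, so $q - 1$ plays the role of the Zipf exponent.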

Using Eq. (5), we fitted the averaged download-frequency data
(Fig. 2(b)) and extracted the parameters q and $\mathrm{Rank}_{FS}$.
The dotted line represents this fit, with $q = 1.351 \pm 0.006$ and
$\mathrm{Rank}_{FS} = 0.60$. From q we deduce
$q/(q-1) = 3.846 \pm 0.068$. This exponent is very close to the
exponent $\alpha = 3.37 \pm 0.45$ of Fig. 1, as well as to
$\alpha = 3.63 \pm 0.28$ deduced from the $\gamma$ value of the
Zipf-law fit to the rank-ordered distribution.
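
One way to see how the Tsallis fit relates to the Zipf fit is to look at the local log-log slope of the rank-to-frequency relation x = x0 + b / (Rank - Rank_FS)^(q - 1). The sketch below uses the fitted q and Rank_FS quoted above. The values of x0 and b are not quoted in the text, so x0 = 0 and b = 1 are placeholders chosen only for illustration (with x0 = 0 the slope does not depend on b).

```python
import math

def x_of_rank(rank, q, x0, b, rank_fs):
    """Rank-to-frequency relation x = x0 + b / (rank - rank_fs)**(q - 1)."""
    return x0 + b / (rank - rank_fs) ** (q - 1.0)

def local_slope(rank, q, x0, b, rank_fs, factor=1.01):
    """Finite-difference estimate of d ln(x) / d ln(rank)."""
    x_lo = x_of_rank(rank, q, x0, b, rank_fs)
    x_hi = x_of_rank(rank * factor, q, x0, b, rank_fs)
    return (math.log(x_hi) - math.log(x_lo)) / math.log(factor)

Q, RANK_FS = 1.351, 0.60   # fitted values quoted in the text
X0, B = 0.0, 1.0           # hypothetical placeholders, not from the text

for rank in (1, 10, 100):
    print(rank, round(local_slope(rank, Q, X0, B, RANK_FS), 3))
```

Under these assumptions the slope is steeper at the smallest ranks, where the finite-size shift matters, and tends to -(q - 1), about -0.35, at large rank. That is comparable in magnitude to the Zipf exponent of 0.38 ± 0.04.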

The scale-free character of the download-frequency statistics
can be interpreted with the BA model. In the linear BA model, the
growth of the network proceeds in two steps. The first is the
early growth process: starting from a small number of vertices
(downloaded papers) that visitors picked out of the large
collection of references on the econophysics web page, at every
time step some visitors add new vertices, and a ranking page of the
vertices is initially constructed. The second is the preferential
attachment process: each visitor to the econophysics web page can
freely access the ranking page of the papers. The higher a paper
ranks, the more likely a visitor is to download it, and the higher
its download frequency becomes in the statistics. Under this
mechanism of the BA model, it is natural that such a preferential
attachment process results in a power-law, or Zipf's law,
distribution of the download frequency.

FIG. 2: The rank-ordered (Zipf-type) plot of the download frequency
for the http://www.unifr.ch/econophysics web page. The symbols are
explained in the figure. See text for details.
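
The two-step picture described above can be illustrated with a toy simulation. This is not the authors' model: it is a simplified rich-get-richer process (closer to a Simon-type urn scheme than to a full BA graph) in which each download either introduces a new paper or goes to an existing paper with probability proportional to its current download count. The function name, the parameter `p_new` and the seed are all hypothetical.

```python
import random

def simulate_downloads(n_steps, p_new=0.05, seed=0):
    """Toy preferential-attachment process for paper downloads.

    Returns the download counts sorted from most to least downloaded.
    """
    rng = random.Random(seed)
    counts = [1]  # start with one paper that has one download
    for _ in range(n_steps):
        if rng.random() < p_new:
            # growth step: a visitor downloads a paper not yet on the list
            counts.append(1)
        else:
            # preferential step: choose a listed paper with probability
            # proportional to its current download count
            idx = rng.choices(range(len(counts)), weights=counts, k=1)[0]
            counts[idx] += 1
    return sorted(counts, reverse=True)

ranked = simulate_downloads(5000)
for rank in (1, 2, 5, 10, 20, 50):
    if rank <= len(ranked):
        print(rank, ranked[rank - 1])
```

Plotting `ranked` against rank on log-log axes gives a Zipf-type plot. The sketch is meant to show the mechanism only and makes no attempt to reproduce the measured exponents.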

In conclusion, scale-free power-law behavior has been observed,
for the first time, in the download-frequency distribution of the
papers on an econophysics web page. The download-frequency
distribution can be described by a power law with exponent
$\alpha = 3.37 \pm 0.45$, which is consistent with the Zipf-law
description of the rank-ordered download frequency with a
scale-free power-law exponent $\gamma \approx 0.38 \pm 0.04$. This
Zipf-law parameter is not far from the exponent of the rank-ordered
citation distribution. This may indicate a similar mechanism for
both networks, which can be explained by the preferential
attachment process of the BA model. On the other hand, the download
frequency is also considered in the framework of Tsallis'
non-extensive entropy theory, which gives the non-extensive entropy
index $q = 1.351 \pm 0.006$ and leads to
$q/(q-1) = 3.846 \pm 0.068$, which is also in good agreement with
the above $\alpha$ parameter.